370 - vm_clone produces VMs that are silently fragile to multi-disk attach#371

Open
domendobnikar wants to merge 3 commits into main from 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub

@domendobnikar domendobnikar commented Jun 3, 2026

Collaborator

The boot order is now set after cloning to avoid problems when additional disks are attached to the VM afterwards.
A default runcmd is also added to the cloud-init user data if cloud_init has been passed as a parameter.

@domendobnikar domendobnikar self-assigned this Jun 3, 2026
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch from 6dbb575 to 689fee5 Compare June 3, 2026 14:36
@domendobnikar domendobnikar requested a review from Copilot June 3, 2026 14:37

Copilot AI left a comment


Pull request overview

This PR aims to address Issue-370 where VMs created via vm_clone can become fragile when additional disks are attached, by ensuring a boot device order is set after cloning. It also introduces a helper for identifying a “primary” disk and modifies the module’s cloud-init defaults.

Changes:

  • After a successful clone, fetch the cloned VM and attempt to set its boot device order.
  • Add VM.get_primary_disk() helper to select a primary disk.
  • Set a default cloud_init.user_data for vm_clone.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
plugins/modules/vm_clone.py Adds post-clone boot order setting logic; changes cloud-init argument defaults.
plugins/module_utils/vm.py Fixes a disk comment typo and adds a helper to select a “primary” disk.

Comment thread plugins/modules/vm_clone.py
Comment thread plugins/modules/vm_clone.py Outdated
Comment thread plugins/module_utils/vm.py Outdated
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch 11 times, most recently from acc3fdc to 67eda0b Compare June 4, 2026 14:32
@domendobnikar domendobnikar requested a review from justinc1 June 4, 2026 14:35
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch 5 times, most recently from 26cb659 to 06d8f76 Compare June 5, 2026 13:19
    - cloud-init's user data now has a default value
    - boot devices are set by default to the primary Virtio disk
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch 2 times, most recently from 34a4622 to fff6ef4 Compare June 5, 2026 16:12
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch from 2d57110 to f2fa4b5 Compare June 5, 2026 20:27
@domendobnikar domendobnikar force-pushed the 370-vm_clone-produces-vms-that-are-silently-fragile-to-multi-disk-attach-reboot-no-bootdevices-vda-anchored-grub branch from e4dcabc to 6f42ae3 Compare June 5, 2026 21:22

@justinc1 justinc1 left a comment

Collaborator

I think the PR solves exactly the problem described in the issue (e.g. the virtio disk case), but it also creates additional corner cases.

I think we need to investigate what can be done. What if the source VM has an IDE or SCSI disk, a mixture, a NIC boot device, etc.?

I'm not sure we will be able to fully solve the problem. I tried creating a simple empty VM with 1 disk in the HC3 web UI. Besides the disk, the VM also got a NIC, an (empty) cloud-init CD-ROM, and a scale-guest-tools-4.2.iso CD-ROM. The disk and the (empty) cloud-init CD-ROM were set as boot devices by HC3. In this case it seems the source VM's boot order could be used as the source of truth. But I did not check all potential ways to create a source VM.
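The "source VM boot order as source of truth" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the device dictionaries, their `kind`, `type`, `slot` and `uuid` keys, and the function names are hypothetical and are not taken from the module or the HC3 API. It also assumes that a device keeps its bus type and slot across a clone while its UUID changes.

```python
def device_key(dev):
    """Identity assumed to survive cloning: UUIDs change, bus type and slot do not."""
    return (dev["kind"], dev["type"], dev["slot"])


def mirror_boot_order(source_boot_devices, cloned_devices):
    """Return UUIDs of cloned devices, ordered like the source VM's boot devices.

    Both arguments are lists of hypothetical device dicts. Source boot devices
    without a counterpart on the clone are skipped, not guessed.
    """
    by_key = {device_key(dev): dev for dev in cloned_devices}
    order = []
    for src in source_boot_devices:
        match = by_key.get(device_key(src))
        if match is not None:
            order.append(match["uuid"])
    return order


# Example shaped like the HC3 web UI case: the disk first, then the cloud-init CD-ROM.
source_boot = [
    {"kind": "disk", "type": "virtio_disk", "slot": 0, "uuid": "src-disk"},
    {"kind": "disk", "type": "ide_cdrom", "slot": 0, "uuid": "src-cdrom"},
]
cloned = [
    {"kind": "disk", "type": "ide_cdrom", "slot": 0, "uuid": "clone-cdrom"},
    {"kind": "nic", "type": "virtio", "slot": 0, "uuid": "clone-nic"},
    {"kind": "disk", "type": "virtio_disk", "slot": 0, "uuid": "clone-disk"},
]
boot_order = mirror_boot_order(source_boot, cloned)
```

An empty source boot order yields an empty list here, so a separate fallback would still be needed for source VMs that have no boot devices set.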


user_data = cloud_init.get("user_data")
if not user_data:
    return cloud_init

Collaborator

No need to fix grub config in this case?

Collaborator Author

I'm not 100% sure. This is from the task description: "If the module accepts cloud_init.user_data, the generated cloud-init should include a runcmd to fix the grub UUID issue at first boot".
I read it as "if the user_data was passed".

user_data.rstrip()
+ """

runcmd:

Collaborator

We might end up adding a user_data section when we already have a user_data section.
We can add unit tests for those corner cases, then refactor this function a bit.
Corner cases:

  • no user_data
  • empty user_data
  • no user_data.runcmd
  • empty user_data.runcmd
  • user_data.runcmd with entries

I guess we want to always comment out GRUB_DISABLE_LINUX_UUID?
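One way to cover the five corner cases listed above is to merge into parsed data instead of appending text. The sketch below is not the PR's implementation: it assumes user_data has already been parsed from YAML into a dict, the function name is hypothetical, and the two shell commands are placeholders for whatever GRUB fix the module ends up using.

```python
from copy import deepcopy

# Placeholder first-boot commands (an assumption, not taken from the PR).
GRUB_FIX_RUNCMD = (
    "sed -i 's/^GRUB_DISABLE_LINUX_UUID/#&/' /etc/default/grub",
    "update-grub",
)


def add_grub_fix_runcmd(user_data, fix_cmds=GRUB_FIX_RUNCMD):
    """Return a copy of parsed user_data whose runcmd contains fix_cmds.

    Handles: no user_data (None), empty user_data ({}), a missing runcmd key,
    an empty or null runcmd, and a runcmd that already has entries.
    Existing entries are kept first; the fix commands are appended only once.
    """
    merged = deepcopy(user_data) if user_data else {}
    runcmd = list(merged.get("runcmd") or [])
    for cmd in fix_cmds:
        if cmd not in runcmd:
            runcmd.append(cmd)
    merged["runcmd"] = runcmd
    return merged


# runcmd with entries: the user's command stays first, the fix is appended.
example = add_grub_fix_runcmd({"packages": ["vim"], "runcmd": ["echo hi"]})
```

Because the merge happens on a dict, a second `runcmd:` key can never be produced, and calling the function twice gives the same result as calling it once.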

hypercore_tags.append(tag)
data["template"]["tags"] = ",".join(hypercore_tags)
if cloud_init:
    cloud_init = cls.clone_add_user_data_to_cloud_init(cloud_init)

Collaborator

I have the impression we want to always run this?

- source_info.records.0.nics.0.mac != cloned_info.records.0.nics.0.mac
- source_info.records.0.node_affinity == cloned_info.records.0.node_affinity
# Cloned VM's boot devices should be set
- cloned_info.records.0.boot_devices | length != 0

Collaborator

I would prefer to test with "==".

I guess source_info.records.0.boot_devices | length == 0?

And cloned_info.records.0.boot_devices | length was 0 before, 1 after PR?

return disk

# primary disk is the largest Virtio disk
def get_primary_disk(self):

Collaborator

We could clone a VM with IDE or SCSI disk(s).

I'm not sure why the largest disk would be the bootable one. Consider a VM with a 100 kB cloud-init ISO, a 20 GB OS disk, and a 100 GB installed-apps disk. Maybe the first disk on the bus is the OS disk, but if we add or remove a disk, that can likely change too. :( :(

Ideally we could ask HC3 about this setting for the source VM. But the source VM might never have been booted before cloning.

Or we could clone the VM, boot it, and then, assuming the cloned VM does boot (really boot, not just wait on "no OS installed"), set this from the cloned VM's runtime state.

A NIC could be a boot device too.

@domendobnikar domendobnikar Jun 10, 2026

Collaborator Author

A cloud-image VM cloned via this module boots correctly for years while it has exactly one virtio disk, then bricks on the first reboot after any tool (CSI driver, manual disk-add, Terraform provider, or even another playbook in the same suite) attaches a second virtio disk.
From my understanding, this issue is about one corner case: a single virtio disk already exists and is used for boot but isn't specified as a boot device, which causes problems later when another disk is added.
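Read narrowly, that corner case could be expressed as a guard that only acts when the choice is unambiguous. This is a rough sketch, not the PR's `get_primary_disk()`: the function name, the disk dict shape, and the `"virtio_disk"` type string are assumptions made for illustration.

```python
def default_boot_disk(disks, boot_devices):
    """Pick a boot disk only in the unambiguous single-virtio-disk case.

    disks: list of hypothetical disk dicts with a "type" key.
    boot_devices: the VM's current boot device list (may be empty).
    Returns the disk to set as boot device, or None to leave the VM untouched.
    """
    if boot_devices:
        # An explicit boot order already exists, so respect it.
        return None
    virtio_disks = [d for d in disks if d["type"] == "virtio_disk"]
    if len(virtio_disks) == 1:
        return virtio_disks[0]
    # Zero or several virtio disks: no safe guess, leave the decision to the caller.
    return None


# One virtio disk plus a cloud-init CD-ROM, no boot devices set yet.
disks = [
    {"type": "ide_cdrom", "uuid": "cdrom-1"},
    {"type": "virtio_disk", "uuid": "disk-1"},
]
chosen = default_boot_disk(disks, boot_devices=[])
```

VMs with IDE or SCSI disks, several virtio disks, or a NIC boot device fall through to `None` here, which leaves those cases open for a different strategy.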

Development

Successfully merging this pull request may close these issues.

vm_clone produces VMs that are silently fragile to multi-disk attach + reboot (no bootDevices, vda-anchored grub)

3 participants